Attribute, Event Sequence, and Event Type Similarity Notions for Data Mining

Author

  • Pirjo Moen
Abstract

In data mining and knowledge discovery, similarity between objects is one of the central concepts. A measure of similarity can be user-defined, but an important problem is defining similarity on the basis of data. In this thesis we consider three kinds of similarity notions: similarity between binary attributes, similarity between event sequences, and similarity between event types occurring in sequences. Traditional approaches for defining similarity between two attributes typically consider only the values of those two attributes, not the values of any other attributes in the relation. Such similarity measures are often useful, but unfortunately they cannot describe all important types of similarity. Therefore, we introduce a new attribute similarity measure that takes into account the values of other attributes in the relation. The behavior of the different measures of attribute similarity is demonstrated by giving empirical results on two real-life data sets. We also present a simple model for defining similarity between event sequences. This model is based on the idea that a similarity notion should reflect how much work is needed in transforming an event sequence into another. We formalize this notion as an edit distance between sequences. Then we show how the resulting measure of distance can be efficiently computed using a form of dynamic programming, and also give some experimental results on two real-life data sets. As the third case of similarity notions, we study how similarity between types of events occurring in sequences could be defined. Intuitively, two
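The idea of measuring sequence similarity by the work needed to transform one sequence into another can be illustrated with a small dynamic-programming sketch. The abstract does not state which edit operations or costs the thesis uses, so everything below is an assumption made for illustration: sequences are reduced to ordered lists of event types (occurrence times are ignored), the operations are insert, delete, and substitute, and the function name `edit_distance` and its cost parameters are hypothetical.

```python
from typing import Sequence


def edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    insert_cost: float = 1.0,      # assumed cost, not taken from the thesis
    delete_cost: float = 1.0,      # assumed cost, not taken from the thesis
    substitute_cost: float = 1.0,  # assumed cost, not taken from the thesis
) -> float:
    """Return the cheapest total cost of turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the cost of transforming source[:i] into target[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost   # delete every event of the prefix
    for j in range(1, cols):
        dist[0][j] = j * insert_cost   # insert every event of the prefix
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
                dist[i - 1][j - 1] + (0.0 if same else substitute_cost),
            )
    return dist[-1][-1]


# Two made-up sequences of event types, for illustration only.
alarms_a = ["A", "B", "C", "A"]
alarms_b = ["A", "C", "A", "D"]
print(edit_distance(alarms_a, alarms_b))
```

The table has one cell per pair of prefixes, so the running time grows with the product of the two sequence lengths. A model that also accounts for occurrence times would need additional operations or time-dependent costs, which this sketch leaves out.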


Similar resources

Attribute Similarity and Event Sequence Similarity in Data Mining Pirjo

In data mining and knowledge discovery, similarity between objects is one of the central concepts. A measure of similarity can be user-defined, but an important problem is defining similarity on the basis of data. In this thesis we consider two kinds of similarity notions: similarity between binary-valued attributes and between event sequences. Traditional approaches for defining similarity betwee...


Event-Based Similarity Search and its Applications in Business Analytics



Event-driven and Attribute-driven Robustness

Over five decades have passed since the first wave of robust optimization studies conducted by Soyster and Falk. It is striking that real-life applications of robust optimization are still neglected; there is much more potential for investigating the exact nature of uncertainties to obtain intelligent robust models. For this purpose, in this study, we investigate a more refined description...


A Framework for Mining Co-evolving Spatial Events

A spatial co-located event set represents a subset of spatial events whose instances are located in a spatial neighborhood. The discovery of co-evolving spatial event sets involves finding co-located event sets whose spatial prevalence variations over time are similar to a specific query sequence. For example, the frequency of drought and wildfire events in Australia over the last 50 years sho...


Similarity Discovery Techniques in Temporal Data Mining

Temporal data mining (TDM) has been attracting more and more interest from a vast range of domains, from engineering to finance. Similarity discovery techniques concentrate on the evolution and development of data, attempting to discover regularities of similarity in how dynamic data evolve. The most significant techniques developed in recent research to deal with similarity discovery in TDM ar...




Publication date: 2000